Palantir VAST 2008 Challenge

Mini Challenge 2: Migrant Boats (geo-temporal analysis)

Authors and Affiliations:

Jason Payne, Palantir Technologies, jpayne@palantirtech.com [PRIMARY contact]

Ravi Sankar, Palantir Technologies

Jason Portnoy, Palantir Technologies

Jake Solomon, Palantir Technologies

Student Team: NO

Tool(s):

For the VAST competition, the analyses were performed primarily in the Palantir Government platform and, to a lesser extent, in Google Earth and the Palantir Finance platform. Both Palantir platforms are being developed by Palantir Technologies, based in Palo Alto, California. Palantir Technologies was founded in 2004 and works with customers across the Intelligence and Finance Communities.

The development team at Palantir made the decision early in the company’s history to develop an analytic platform based on a foundation of openness, a trait not often seen in the intelligence community. As old institutions transition into a world where information is increasingly a commodity, the archaic paradigms of locking down knowledge are giving way to an environment where analysis is the real power. Palantir Technologies is able to liberate this power in four concrete ways. The first is data integration: whether the data is structured or unstructured, Palantir provides standard and extensible interfaces for bringing information into a common environment. The second is Search and Discovery, whereby these disparate data stores can be explored as though they were one. The third is Knowledge Management, in which all the knowledge that is discovered is treated like another data source so that no analysis is lost. The fourth is Collaboration, whereby many analysts working together can truly leverage their collective mind. Through our open APIs and numerous (and multiplying) extensibility points, Palantir has succeeded in creating a genuine platform for application development and information analysis.

Two Page Summary: YES (will be submitted before 18 Aug)

Answers:

Boat-1 Characterize the choice of landing sites and their evolution over the three years.

Video link:

Boat Video

Detailed Answer:

Palantir features integration with the leaders in Geospatial Information Systems (GIS), including Google Earth and ESRI’s ArcGIS. Palantir Government 2.0, which will be released between this contest’s deadline and the VAST symposium, focuses in part on increasing geospatial capabilities. Even with our current level of integration, however, this challenge is exactly the kind of open-ended, large-scale, pattern-seeking analysis that our platform was designed for. Palantir has led to deep insights into the geo-temporal migration data provided.

We began by transforming the original XML data into Palantir's open XML format (pXML) with a simple XSLT in order to feed the data into our GUI importer (figure 1). Extracting properties and relationships from either structured or unstructured datasets in Palantir requires minimal effort and time for even non-technical users. We grouped the data into launch, landing, and interdiction events with links to the passengers aboard each craft. Whenever launch data was available, we linked the launch event to the resulting landing or interdiction. To aid our temporal analysis, we made the arbitrary assumption that "Go Fasts" launched a day before interdiction/landing, rustics 2 days before, and rafts 3 days before.
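The launch-date assumption above can be written as a small lookup. This is a minimal sketch in Python, not the workflow we ran in Palantir: the function name, the dictionary, the craft-type labels, and the example date are illustrative assumptions, and only the day offsets come from the text.

```python
from datetime import date, timedelta

# Assumed transit times in days, taken from the stated assumption:
# "Go Fasts" 1 day, rustics 2 days, rafts 3 days.
TRANSIT_DAYS = {"go fast": 1, "rustic": 2, "raft": 3}


def estimate_launch_date(end_date: date, craft_type: str) -> date:
    """Back-date a landing or interdiction to an estimated launch date."""
    days = TRANSIT_DAYS[craft_type.strip().lower()]
    return end_date - timedelta(days=days)


# Hypothetical example date, not a record from the dataset.
print(estimate_launch_date(date(2006, 3, 15), "Raft"))  # 2006-03-12
```

Because the offsets are arbitrary, keeping them in one table makes it easy to rerun the temporal analysis under different transit-time assumptions.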

Figure 1: Data Import

The flexibility of Palantir’s platform allowed us to analyze the migration data along many dimensions. Our team was able to investigate landing sites in relation to time, type of craft, launch sites, and success rate. With a quick filter, we brought up all the landing events in our database and displayed them in the Graph (figure 2.1). A tap of ‘Ctrl-A’ and ‘Ctrl-G’ is all it takes to export the events to Google Earth (figure 2.2).

Figure 2.1: All landing events in the Palantir Graph

Figure 2.2: The same events in Google Earth

We briefly explored the map and manipulated the time window to get an overview of the situation. Immediately, we noticed that the landings started around the Florida Keys and later spread much farther north. On closer inspection, we caught some landings on the Yucatan Peninsula of Mexico. We grouped the landings into four zones: South Florida, East Florida, West Florida, and Mexico. We used the geography of Florida to create latitude/longitude zones that would designate our landing regions: 26.15N marked the end of “South Florida” and a longitude line at 81.5W designated the east-west border (figure 3.1). We converted these coordinates into filters in Palantir and easily divided the landing events into multiple groups (figure 3.2).
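The zone filters can be illustrated with a short classifier. This is a sketch only, written in Python rather than as Palantir filters. The 26.15N and 81.5W boundaries come from the text; the two Mexico cutoffs are assumptions introduced here for illustration, since the text does not state how the Mexican landings were bounded, and the function and constant names are hypothetical.

```python
SOUTH_FLORIDA_MAX_LAT = 26.15  # from the text: 26.15N ends "South Florida"
EAST_WEST_LON = -81.5          # from the text: 81.5W splits east from west

# ASSUMPTION: neither value appears in the text; they are hypothetical
# cutoffs that separate the Yucatan Peninsula from Florida.
MEXICO_MAX_LON = -85.0
MEXICO_MAX_LAT = 23.0


def landing_zone(lat: float, lon: float) -> str:
    """Assign a landing (decimal degrees, west longitude negative) to a zone."""
    if lon < MEXICO_MAX_LON and lat < MEXICO_MAX_LAT:
        return "Mexico"
    if lat < SOUTH_FLORIDA_MAX_LAT:
        return "South Florida"
    return "East Florida" if lon > EAST_WEST_LON else "West Florida"


# Approximate city coordinates used as sanity checks, not dataset records.
print(landing_zone(24.55, -81.78))  # South Florida (Key West)
print(landing_zone(27.95, -82.46))  # West Florida (Tampa)
```

Checking the Mexico box first keeps far-western points out of the Florida zones, and the latitude cutoff is applied before the east-west split so that the whole southern tip stays in one zone.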

Figures 3.1 and 3.2: Landing zones, geospatially (left), arranged in Palantir (right)

Figures 4.1 and 4.2: Landings, one year time slice, grouped (left) and one month, expanded (right)

By scrolling through the Timeline in Palantir (figures 4.1 and 4.2), manipulating data in Google Earth, and doing basic statistical analysis on numbers from the Histogram tool, we reconstructed the following history: Migrants began escaping from Isla del Sueño to the nearby tip of Florida in early 2005, and it remained a common landing site throughout the three-year period. In early 2006, they began landing in West Florida, and this site gained popularity in mid 2006. Mexico followed in mid 2006 as the third landing site that migrants chose, and it quickly reached a high activity level despite the distance. In fact, over the entire three-year period, the greatest proportion of successful landings was on the Yucatan Peninsula, probably because the U.S. Coast Guard does not patrol Mexican waters. Finally, in early 2007, migrants began to land on the eastern coast of Florida; however, this site never reached the numbers of landings seen in the other three zones. Overall, roughly 43% of successful landings were in Mexico, 33% in South Florida, 17% in West Florida, and fewer than 7% in East Florida.

The landing sites could often be characterized as densely populated areas; perhaps density makes it easier to blend in, or it is simply not a concern for migrants. 65% of all voyages employed rustic boats, 20% rafts, and 15% “Go Fasts” (percentages reported are rounded). A relationship between boat choice and final destination is present but weak: rustics were more popular among new West Floridians (70%) than South Floridians (61%); “Go Fasts” were somewhat less common a choice for would-be East Floridians (5% below average); and rafts were both unusually unpopular for those heading to the West (8% below average) and unusually popular on the East (7% above average). Boat-type choice had almost no effect on the overall landing success rate, which is extremely close to 48% for all three craft types. The less stable monthly success rates began low (25% or below), rose well above 60%, and dropped to 0% again before reaching a relatively consistent level of 40-70% in the last year and a half (see the chart in short answer 3). Trips to South Florida and Mexico produced 7% more fatality-free journeys than those to West Florida. In sum, we uncovered a large number of approaches to analyzing landing sites simply by having Palantir sort landing events into zones and examining the resulting aggregates, conveniently presented by the Histogram (figure 5).

Figure 5: Histograms for landings in Mexico (left) and East Florida (right)

Our most interesting analysis track viewed the landing data with respect to its associated launch data. We created a new investigation in Palantir and added all launch events (figure 6.1).

Figure 6.1: All launch events in Palantir, divided by launch site (picture labeled externally)

After exporting these to Google Earth we noticed that four very distinct portions of the island were being used for launches. On closer inspection, we decided to divide the northwestern region into two sites, as the coast juts out at one point and creates two distinct inlets (figure 6.2).

Figure 6.2: The launch sites, numbered

Using the same basic workflow as we used for the landings, we created geographic filters in Palantir to divide the launch events into groups. The launch data is incomplete but very interesting nonetheless. The very first launch took place from site 5, and sites 1, 4, and 5 were the most popular for the first two years. In the third year, activity exploded across the launch sites. With this background information obtained, we asked Palantir to do a link-by search and bring in all landing events that share an “appears in” relationship with our launches. Then we applied the filters we had created to designate landing zones (figure 7) to see if there was any correlation between launch sites and landing zones.
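The link-then-filter step amounts to a cross-tabulation of launch site against landing zone. The following Python sketch shows one way to compute it outside Palantir. The records are made-up placeholders, not values from the challenge dataset, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Placeholder (launch_site, landing_zone) pairs for illustration only;
# these are NOT records from the challenge dataset.
linked_voyages = [
    (5, "Mexico"), (5, "Mexico"), (5, "South Florida"),
    (3, "East Florida"), (4, "South Florida"),
]


def destination_shares(pairs):
    """Return {site: {zone: share of that site's linked landings}}."""
    counts = defaultdict(Counter)
    for site, zone in pairs:
        counts[site][zone] += 1
    return {
        site: {zone: n / sum(zones.values()) for zone, n in zones.items()}
        for site, zones in counts.items()
    }


shares = destination_shares(linked_voyages)
print(shares[5])
```

Normalizing within each launch site, rather than across all voyages, keeps a busy site from dominating the comparison.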

Figure 7: All launches from site 5 that landed in South Florida

Using the Histogram from this dataset, we uncovered a rather strong link between launch site and the eventual landing site of successful boats (figure 8).

Figure 8: Relationship between launch site and destination

Importantly, even when we calculated this relationship in terms of all boats leaving the site rather than only the successful ones, the site preferences remained the same. This insight could be used to improve Coast Guard interdiction rates: if information suggests that a boat has left from the southeast (site 5) of Isla del Sueño, it is probably heading for Mexico; however, if it is leaving from the island’s northernmost point (site 3), watch the eastern approach to Florida. Moreover, the various launch sites have very different success rates, with boats from site 4 being the most likely to land (58%) and those from site 3 the least likely (31%). With the unique power of Palantir, this kind of complex, multi-dimensional analysis is not just possible but intuitive and easy to perform.

We also looked at the Coast Guard ships that interdicted vessels from the island. We found that the USS Ironwood had the most interdictions, at 26, and the USS Bold Reef had the fewest, with 13.

Figure 9: Coast Guard Vessels surrounded by their interdictions

Lastly, we analyzed the rosters of the vessels. With the Histogram, we noticed that some of the names appeared in multiple voyages, and two jumped out at us: Jesus Vidro and Eduardo Catalano. They traveled together on two interdicted voyages before finally landing successfully.
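The roster check can be sketched as a search for passenger pairs who share more than one voyage. This Python example is illustrative only. The rosters are placeholders rather than dataset records, and the function name and the `min_shared` parameter are assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder rosters (voyage id -> passenger names); illustrative only.
rosters = {
    "voyage-1": ["Passenger A", "Passenger B", "Passenger C"],
    "voyage-2": ["Passenger A", "Passenger B"],
    "voyage-3": ["Passenger A", "Passenger B", "Passenger D"],
}


def shared_voyages(rosters, min_shared=2):
    """Map each passenger pair to the voyages they shared, if >= min_shared."""
    together = defaultdict(list)
    for voyage, names in rosters.items():
        # Sort and de-duplicate so each pair has one canonical key.
        for pair in combinations(sorted(set(names)), 2):
            together[pair].append(voyage)
    return {pair: v for pair, v in together.items() if len(v) >= min_shared}


pairs = shared_voyages(rosters)
print(pairs)
```

Exact string matching is used here; real rosters may need name normalization before repeat travelers can be detected reliably.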

Figure 10: Shared travel of Vidro and Catalano

Boat-2 Characterize the geographical patterns of interdiction over the three years.

Short Answer:

We exported interdictions from Palantir to Google Earth and created an axis using Lat x and Long y. With our geospatial animation, we then constructed a four-stage progression out of Coast Guard interdictions: encounters begin at Florida’s tip and north of Isla del Sueño, and through 2005 they spread horizontally and southward. In mid-2006 they gain a northward component, but this spreading pattern is slowed in early 2007 when interdictions shift southwest. Finally, interdictions push east to surround the island later that year. We also noted the dissolution of several “boundaries”: the first encounter to clear the island’s southern tip occurs in May 2006. An encounter first crosses our “north” line in October 2005 on the east but not until June 2006 on the west. Throughout these stages, interdictions never reach as far north as landings or cover Mexico (perhaps a jurisdictional problem), a critical strategic weakness to address.

Figure 1: Interdictions during early 2005

Figure 2: Interdictions grouped by boat with all boats during 2006’s peak period selected (note: the close-up was interposed onto the picture)

Boat-3 What is the successful landing rate over the time period?

Short Answer:

We began by adding all interdictions and landings to the Graph and viewing the events in Palantir’s Histogram. The Histogram reveals that there are 917 events, 441 of which are landings and the rest of which are interdictions. Assuming that there are no vessels lost at sea [i.e., every voyage ends in a logged landing or interdiction], the overall landing success rate is 48.09%. We can also find much more granular success rates with ease. Activating a temporal filter on the Timeline (for example, March 2006) will highlight only the events during that time period and restrict the Histogram to them. Thus, we can see that there are five landings and thirteen interdictions that month, for a landing rate of slightly less than 28%. The yearly landing rates are roughly 32%, 39%, and 53% for 2005, 2006, and 2007 respectively, a seemingly steady rise that is much more erratic in reality. We also exported the data from Palantir Government and imported it into Palantir Finance for advanced time series analysis, as seen in figure 1. It appears that the rustic arrival rate is the most stable, and the go fast rate is the most erratic over time.
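The rate arithmetic above can be reproduced in a few lines. This Python sketch uses only the counts quoted in the text; the function name is an illustrative assumption, and the calculation relies on the stated assumption that every voyage ends in a logged landing or interdiction.

```python
def landing_rate(landings: int, interdictions: int) -> float:
    """Landings as a fraction of all logged voyage outcomes."""
    total = landings + interdictions
    if total == 0:
        raise ValueError("no events in the selected time window")
    return landings / total


# Overall: 917 events, 441 of them landings (counts from the text).
overall = landing_rate(441, 917 - 441)
# March 2006: five landings and thirteen interdictions (from the text).
march_2006 = landing_rate(5, 13)

print(f"{overall:.2%}")     # 48.09%
print(f"{march_2006:.2%}")  # 27.78%
```

The same function applied to each one-month window yields the month-by-month series charted in Palantir Finance.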

Figure 1: Month-by-month chart of landing success rates in Palantir Finance

Figure 2: An example of the one-month temporal filters used to create the chart above